---
name: flowio
description: Parse Flow Cytometry Standard (FCS) files v2.0–3.1 and extract events/metadata for preprocessing workflows (e.g., when you need NumPy arrays, channel info, or CSV/DataFrame export from cytometry files).
license: MIT
author: aipoch
---

> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

## When to Use

- You need to read FCS v2.0/3.0/3.1 files and extract event matrices for downstream preprocessing.
- You want to inspect or validate FCS metadata (TEXT segment) without loading event data (memory-efficient parsing).
- You need channel definitions (PnN/PnS), ranges (PnR), and automatic identification of scatter/fluorescence/time channels.
- You need to handle problematic FCS files with offset inconsistencies or multi-dataset content.
- You want to export cytometry events to CSV/Pandas DataFrames or write new or modified FCS files.

## Key Features

- **FCS parsing (v2.0–3.1):** Reads HEADER/TEXT/DATA and optional ANALYSIS segments.
- **Event extraction to NumPy:** Returns event data as an `ndarray` with shape `(events, channels)`.
- **Optional preprocessing:** Applies standard FCS transformations (gain/log/time scaling) when enabled.
- **Metadata access:** Exposes TEXT keywords and common instrument/acquisition fields.
- **Channel utilities:** Provides PnN/PnS labels, ranges, and indices for scatter/fluorescence/time channels.
- **Robust parsing options:** Flags for offset-discrepancy handling and null-channel exclusion.
- **Multi-dataset support:** Detects and reads files containing multiple datasets.
- **FCS writing:** Creates new FCS files from arrays, optionally preserving or overriding metadata.
## Dependencies

- `python >= 3.9`
- `flowio` (install via pip/uv; version depends on your environment)
- Example-only:
  - `numpy >= 1.20`
  - `pandas >= 1.5`

## Example Usage

```python
"""
End-to-end example:
1) Read an FCS file (metadata + events)
2) Convert to a Pandas DataFrame and export CSV
3) Filter events and write a new FCS file
4) Handle multi-dataset files
"""
from pathlib import Path

import numpy as np
import pandas as pd
from flowio import FlowData, create_fcs, read_multiple_data_sets
from flowio.exceptions import (
    DataOffsetDiscrepancyError,
    FCSParsingError,
    MultipleDataSetsError,
)

FCS_PATH = "sample.fcs"


def read_fcs_safely(path: str) -> FlowData:
    try:
        return FlowData(path)
    except DataOffsetDiscrepancyError:
        # Common workaround for files with inconsistent offsets
        return FlowData(path, ignore_offset_discrepancy=True)
    except FCSParsingError:
        # Looser mode if the file is malformed
        return FlowData(path, ignore_offset_error=True)


def main() -> None:
    # --- 1) Read file (single dataset) ---
    try:
        flow = read_fcs_safely(FCS_PATH)
    except MultipleDataSetsError:
        # --- 4) Multi-dataset handling ---
        datasets = read_multiple_data_sets(FCS_PATH)
        flow = datasets[0]  # pick the first dataset for this demo

    print("File:", getattr(flow, "name", Path(FCS_PATH).name))
    print("FCS version:", flow.version)
    print("Events:", flow.event_count)
    print("Channels:", flow.channel_count)
    print("PnN labels:", flow.pnn_labels)

    # Metadata (TEXT segment)
    print("Instrument ($CYT):", flow.text.get("$CYT", "N/A"))
    print("Acquisition date ($DATE):", flow.text.get("$DATE", "N/A"))

    # --- 2) Events -> NumPy -> DataFrame -> CSV ---
    events = flow.as_array(preprocess=True)  # default preprocessing behavior
    df = pd.DataFrame(events, columns=flow.pnn_labels)
    df.to_csv("events.csv", index=False)
    print("Wrote CSV:", "events.csv")

    # --- 3) Filter and write a new FCS ---
    # Example: threshold on the first scatter channel if available, else channel 0
    fsc_idx = flow.scatter_indices[0] if getattr(flow, "scatter_indices", []) else 0
    threshold = np.percentile(events[:, fsc_idx], 50)  # median threshold
    mask = events[:, fsc_idx] > threshold
    filtered = events[mask]

    # create_fcs() takes an open binary file handle and flattened event data
    with open("filtered.fcs", "wb") as fh:
        create_fcs(
            fh,
            filtered.flatten(),
            flow.pnn_labels,
            opt_channel_names=flow.pns_labels,
            metadata_dict={**flow.text, "$SRC": "Filtered via FlowIO example"},
        )
    print("Wrote FCS:", "filtered.fcs")

    # --- Metadata-only read (memory efficient) ---
    meta_only = FlowData(FCS_PATH, only_text=True)
    print("Metadata-only read: $DATE =", meta_only.text.get("$DATE", "N/A"))


if __name__ == "__main__":
    main()
```

## Implementation Details

### Data Model and Segments

An FCS file is organized into segments:

- **HEADER:** FCS version and byte offsets for the other segments.
- **TEXT:** Keyword/value metadata (e.g., `$DATE`, `$CYT`, `$PnN`, `$PnS`, `$PnR`, `$PnG`, `$PnE`).
- **DATA:** Event matrix encoded as integer/float/double/ASCII, depending on file keywords.
- **ANALYSIS (optional):** Post-processing results, if present.

In FlowIO, these are exposed via `FlowData` attributes such as:

- `flow.header` (HEADER info)
- `flow.text` (TEXT keyword dictionary)
- `flow.analysis` (ANALYSIS keyword dictionary, if present)
- `flow.as_array(...)` (decoded event matrix)

### Preprocessing (`as_array(preprocess=True)`)

When preprocessing is enabled, FlowIO applies the standard FCS transformations:

1. **Gain scaling (PnG):** Per the FCS specification, stored values of linearly amplified parameters are divided by the per-parameter amplifier gain.
2. **Log/exponential transform (PnE):** For log-amplified parameters with `PnE = "f1,f2"` (decades `f1`, offset `f2`), the linearized value is `f2 * 10^(f1 * raw / PnR)`, where `PnR` is the channel range.
3. **Time scaling:** If a time channel is detected, its values may be scaled into acquisition-time units.
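The gain and log transforms above can be sketched without FlowIO. The helper below is illustrative only (the name `decode_value` and its defaults are hypothetical, not FlowIO API); it follows the FCS-specification semantics in which `PnE = "f1,f2"` gives decades `f1` and offset `f2`, and `PnR` is the channel range:

```python
def decode_value(raw: float, png: float = 1.0,
                 pne: tuple = (0.0, 0.0), pnr: float = 1024.0) -> float:
    """Illustrative per-channel FCS decode (hypothetical helper, not FlowIO API)."""
    f1, f2 = pne
    if f1 > 0:
        # Log-amplified parameter: linearize per $PnE
        return f2 * 10.0 ** (f1 * raw / pnr)
    if png not in (0.0, 1.0):
        # Linearly amplified parameter: divide by the $PnG gain
        return raw / png
    return raw

# A 10-bit channel with 4-decade log amplification:
print(decode_value(512.0, pne=(4.0, 1.0)))  # 10^(4 * 512 / 1024) = 100.0
```

Note that $PnE and $PnG are mutually exclusive for a given parameter: a log-amplified channel is linearized via $PnE, while a linear channel is scaled by its gain.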
To disable all transformations and obtain raw decoded values:

- `flow.as_array(preprocess=False)`

### Channel Identification

FlowIO provides convenience indices for common channel types:

- `flow.scatter_indices` (e.g., FSC/SSC)
- `flow.fluoro_indices` (fluorescence channels)
- `flow.time_index` (time channel index, or `None`)

These indices can be used to slice the event matrix:

- `events[:, flow.scatter_indices]`
- `events[:, flow.fluoro_indices]`

### Handling Problematic Files (Offsets and Null Channels)

Some files contain inconsistent offsets between the HEADER and TEXT segments:

- `ignore_offset_discrepancy=True` tolerates a HEADER/TEXT offset mismatch.
- `use_header_offsets=True` prefers the HEADER offsets.
- `ignore_offset_error=True` bypasses offset-related failures more aggressively.

To exclude known null/empty channels during parsing:

- `FlowData(path, null_channel_list=[...])`

### Multi-Dataset Files

If a file contains multiple datasets, constructing `FlowData(path)` may raise `MultipleDataSetsError`. Use:

- `read_multiple_data_sets(path)` to load all datasets, or
- `FlowData(path, nextdata_offset=...)` to load a specific dataset via its `$NEXTDATA` offset.

### Writing FCS

Two common patterns:

- **Metadata-only changes:** `flow.write_fcs("out.fcs", metadata={...})`
- **Modified event data:** extract the array → modify → `create_fcs(...)` to generate a new file (FlowIO does not modify event data in place).
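For reference, the `$NEXTDATA` chaining used for multi-dataset files lives in the TEXT segment, which is just a delimiter-separated keyword/value stream. A minimal, FlowIO-independent sketch of reading such keywords (illustrative only; real TEXT segments may contain escaped double-delimiters, which this sketch ignores):

```python
def parse_text_segment(raw: bytes) -> dict:
    """Parse an FCS TEXT segment sketch: the first byte is the delimiter,
    followed by alternating keyword/value tokens."""
    delim = raw[:1].decode("ascii")
    tokens = raw.decode("ascii").strip(delim).split(delim)
    # FCS keywords are case-insensitive; normalize to upper case.
    return {k.upper(): v for k, v in zip(tokens[::2], tokens[1::2])}

text = parse_text_segment(b"/$DATATYPE/F/$PAR/3/$NEXTDATA/0/")
print(text["$NEXTDATA"])  # "0" means no further datasets follow
```

A nonzero `$NEXTDATA` value is a byte offset (relative to the start of the current dataset) to the next dataset's HEADER, which is what `FlowData(path, nextdata_offset=...)` lets you target directly.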